========================================================


Initial Set-up

I chose to explore the financial contributions from Georgia to the election campaigns. What differences can we find between Republican and Democratic supporters?

setwd('~/Documents/datasets')
ga_obama <-read.csv('P80003338-GA.csv')

What are some things that you notice right away?

The first file I downloaded had only 31 observations for one candidate, Ted Cruz, so I went back to the website to understand what data was available. I then downloaded the dataset of Georgia contributions to President Obama’s 2012 campaign. When I tried to load the csv file, I got an error indicating that the first column did not have unique row names. I checked the documentation on www.fec.gov and saw that “tran_id” (transaction id) is supposed to be unique within the dataset. I moved that column to the first position and was then able to read in the file.
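One way to confirm that the transaction id really is unique after loading is sketched below. The three-row data is a made-up stand-in for the FEC file (ids and amounts are illustrative only, not taken from the download); `anyDuplicated()` is base R and returns 0 when no value repeats.

```r
# Toy stand-in for the FEC export; ids and amounts are illustrative only.
csv_text <- 'tran_id,cand_nm,contb_receipt_amt
C1,"Obama, Barack",15
C2,"Obama, Barack",250
C3,"Obama, Barack",-25'

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# anyDuplicated() returns 0 when every value occurs exactly once.
ids_unique <- anyDuplicated(toy$tran_id) == 0
print(ids_unique)
```

On the real data the same check would be `anyDuplicated(ga_obama$tran_id) == 0`.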

str(ga_obama)
## 'data.frame':    96601 obs. of  18 variables:
##  $ tran_id          : Factor w/ 96601 levels "C10968531","C10968533",..: 1798 1153 1314 1079 1015 1620 1437 1128 2003 1290 ...
##  $ cmte_id          : Factor w/ 1 level "C00431445": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cand_id          : Factor w/ 1 level "P80003338": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cand_nm          : Factor w/ 1 level "Obama, Barack": 1 1 1 1 1 1 1 1 1 1 ...
##  $ contbr_nm        : Factor w/ 19006 levels "AARON, BILLYE S",..: 2971 12501 13429 9836 10432 3207 14808 6476 5631 18533 ...
##  $ contbr_city      : Factor w/ 573 levels "A","AATLANTA",..: 32 32 42 42 32 45 32 32 289 196 ...
##  $ contbr_st        : Factor w/ 1 level "GA": 1 1 1 1 1 1 1 1 1 1 ...
##  $ contbr_zip       : int  30311 303083347 30168 30168 303191018 300021562 303443231 303094147 300461264 302158012 ...
##  $ contbr_employer  : Factor w/ 8304 levels "","(SELF) ROBERTSON'S DECORATING CENTER I",..: 5982 3859 5334 5982 5982 5982 3859 6389 2277 5982 ...
##  $ contbr_occupation: Factor w/ 5170 levels "","_","(R) RT",..: 4390 2227 1734 3923 3923 3923 3923 1460 346 4733 ...
##  $ contb_receipt_amt: num  15 250 25 112 25 50 50 250 50 10 ...
##  $ contb_receipt_dt : Factor w/ 613 levels "1-Apr-12","1-Aug-11",..: 202 389 556 291 88 612 345 369 400 495 ...
##  $ receipt_desc     : Factor w/ 2 levels "","Refund": 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_cd          : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_text        : Factor w/ 22 levels "","*","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ form_tp          : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ file_num         : int  756218 756218 756218 756218 756218 756218 756218 756218 756218 756218 ...
##  $ election_tp      : Factor w/ 5 levels "G2008","G2012",..: 5 5 5 5 5 5 5 5 5 5 ...
summary(ga_obama)
##       tran_id           cmte_id           cand_id     
##  C10968531:    1   C00431445:96601   P80003338:96601  
##  C10968533:    1                                      
##  C10970191:    1                                      
##  C10970429:    1                                      
##  C10971061:    1                                      
##  C10972400:    1                                      
##  (Other)  :96595                                      
##           cand_nm                   contbr_nm             contbr_city   
##  Obama, Barack:96601   CORNUTT, SUSAN    :  159   ATLANTA       :27543  
##                        SOUTHERLAND, LINDA:  150   DECATUR       : 5777  
##                        SHASH, AMIR       :  144   MARIETTA      : 4323  
##                        LAMB, ALYSE       :  131   ATHENS        : 2521  
##                        CAUTHEN, GEORGE   :  119   STONE MOUNTAIN: 2366  
##                        GOODELL, JANNAH   :  118   ALPHARETTA    : 2351  
##                        (Other)           :95780   (Other)       :51720  
##  contbr_st    contbr_zip                     contbr_employer 
##  GA:96601   Min.   :       40   RETIRED              :17454  
##             1st Qu.:    30312   SELF-EMPLOYED        : 8410  
##             Median :300346461   NOT EMPLOYED         : 8299  
##             Mean   :170280434   INFORMATION REQUESTED: 2615  
##             3rd Qu.:303095111   EMORY UNIVERSITY     : 1104  
##             Max.   :912142739   (Other)              :58713  
##                                 NA's                 :    6  
##              contbr_occupation contb_receipt_amt   contb_receipt_dt
##  RETIRED              :18751   Min.   :-5000.00   17-Oct-12: 3153  
##  ATTORNEY             : 3337   1st Qu.:   19.00   2-Nov-12 : 2758  
##  PHYSICIAN            : 2518   Median :   35.00   31-Aug-12: 1986  
##  INFORMATION REQUESTED: 2371   Mean   :   99.27   31-Oct-12: 1882  
##  HOMEMAKER            : 2051   3rd Qu.:  100.00   23-Oct-12: 1836  
##  (Other)              :67572   Max.   : 5000.00   28-Sep-12: 1662  
##  NA's                 :    1                      (Other)  :83324  
##  receipt_desc   memo_cd  
##        :95852    :77835  
##  Refund:  749   X:18766  
##                          
##                          
##                          
##                          
##                          
##                                      memo_text      form_tp     
##                                           :77711   SA17A:77102  
##  * OBAMA VICTORY FUND 2012                :18638   SA18 :18750  
##  *                                        :  122   SB28A:  749  
##  * EARMARKED CONTRIBUTION: SEE BELOW      :   82                
##  EXCESSIVE CONTRIBUTION REFUNDED OCT. 2012:   11                
##  EXCESSIVE CONTRIB. REFUNDED SEPT. 2012   :    9                
##  (Other)                                  :   28                
##     file_num      election_tp  
##  Min.   :756214   G2008:    3  
##  1st Qu.:810684   G2012:52501  
##  Median :821325   O2012:   89  
##  Mean   :820035   P2008:    8  
##  3rd Qu.:840327   P2012:44000  
##  Max.   :853328                
## 
names(ga_obama)
##  [1] "tran_id"           "cmte_id"           "cand_id"          
##  [4] "cand_nm"           "contbr_nm"         "contbr_city"      
##  [7] "contbr_st"         "contbr_zip"        "contbr_employer"  
## [10] "contbr_occupation" "contb_receipt_amt" "contb_receipt_dt" 
## [13] "receipt_desc"      "memo_cd"           "memo_text"        
## [16] "form_tp"           "file_num"          "election_tp"
library(ggplot2)

Univariate plots section - distribution of contribution amounts among supporters.

First, I wanted to understand the number of people donating to President Obama’s campaign at the different amount levels.

qplot(x=contb_receipt_amt, data = ga_obama) 
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
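The message above explains why this first histogram is so coarse. As a rough check, the default bin width can be worked out from the minimum and maximum that `summary(ga_obama)` reports for `contb_receipt_amt` (-5000 and 5000). This is only a sketch of the arithmetic in the message, not a call into ggplot2.

```r
# Minimum and maximum of contb_receipt_amt, as reported by summary(ga_obama).
amt_min <- -5000
amt_max <- 5000

# The message says the default bin width is range / 30.
default_binwidth <- (amt_max - amt_min) / 30
print(round(default_binwidth, 2))
```

Each default bin is therefore over $300 wide, which hides all the detail below $150; that is why an explicit `binwidth=10` is used next.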

qplot(x=contb_receipt_amt, data = ga_obama, binwidth=10, xlab="Amount of Contribution",ylab="Number of contributors") +
   scale_x_continuous(limits=c(0,2000), breaks=seq(0,2000,100))

I noticed that most people contributed less than $150, so I narrowed the scale to see this more closely:

qplot(x=contb_receipt_amt, data = ga_obama, binwidth=10, xlab="Amount of Contribution",ylab="Number of contributors") +
   scale_x_continuous(limits=c(0,275), breaks=seq(0,275,25))
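The "most people contributed less than $150" reading can also be checked numerically rather than by eye. The sketch below uses only the first ten amounts printed by `str(ga_obama)`, so the result illustrates the technique and is not the share for the full dataset.

```r
# First ten values of contb_receipt_amt as printed by str(ga_obama).
amounts <- c(15, 250, 25, 112, 25, 50, 50, 250, 50, 10)

# mean() of a logical vector is the proportion of TRUE values.
share_under_150 <- mean(amounts < 150)
print(share_under_150)
```

For the full data, `mean(ga_obama$contb_receipt_amt < 150)` would give the corresponding proportion (refunds included, since they are negative).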

qplot(x=contb_receipt_amt, y= ..count../sum(..count..), data = ga_obama, 
      xlab="Amount of Contribution",
      ylab="Proportion of users who contributed that amount",
      geom='freqpoly') +
      scale_x_continuous()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(x=contb_receipt_amt, y= ..count../sum(..count..), data = ga_obama, 
      xlab="Amount of Contribution",
      ylab="Proportion of users who contributed that amount",
      geom='freqpoly') +
      scale_x_continuous(limits=c(-500,500), breaks=seq(-500,500, 100))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 2 rows containing missing values (geom_path).

These last two plots show that the large majority of President Obama’s supporters contributed less than $300. But how does the Republican candidate compare? I downloaded the Georgia contributions to Mitt Romney’s 2012 campaign from the FEC website.

setwd('~/Documents/datasets')
ga_mitt <-read.csv('P80003353-GA.csv') 
str(ga_mitt)
## 'data.frame':    52891 obs. of  18 variables:
##  $ tran_id          : Factor w/ 30037 levels "SA17.1000013",..: 13262 13128 13606 13793 13666 13423 12913 13723 13789 13722 ...
##  $ cand_id          : Factor w/ 1 level "P80003353": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cand_nm          : Factor w/ 1 level "Romney, Mitt": 1 1 1 1 1 1 1 1 1 1 ...
##  $ contbr_nm        : Factor w/ 14511 levels "21ST CENTURY MAJORITY FUND",..: 545 546 557 1688 2659 2672 2679 594 1714 1758 ...
##  $ contbr_city      : Factor w/ 525 levels "ABBEVILLE","ACCATUR",..: 29 29 135 29 29 29 29 29 13 13 ...
##  $ contbr_st        : Factor w/ 1 level "GA": 1 1 1 1 1 1 1 1 1 1 ...
##  $ contbr_zip       : int  303062621 303051038 307220624 303054018 303424408 303264229 303051352 303395362 300044546 300225180 ...
##  $ contbr_employer  : Factor w/ 6313 levels "","(RETIRED BANKER)",..: 265 4739 4739 1036 596 1365 1368 3879 2495 2755 ...
##  $ contbr_occupation: Factor w/ 2637 levels "","11B","A-320 INSTRUCTOR PILOT",..: 136 2000 2000 2529 1632 1749 277 136 1171 1088 ...
##  $ contb_receipt_amt: num  250 2500 1000 250 2000 1000 2500 1000 2500 2500 ...
##  $ contb_receipt_dt : Factor w/ 511 levels "1-Aug-11","1-Aug-12",..: 134 454 8 390 80 267 354 249 390 249 ...
##  $ receipt_desc     : Factor w/ 12 levels "","ATTRIBUTION TO PARTNERS REQUESTED / REDESIGNATION REQUESTED",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_cd          : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_text        : Factor w/ 51 levels "","ATTRIBUTION TO PARTNERS REQUESTED",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ form_tp          : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ file_num         : int  760248 760248 760248 760248 760248 760248 760248 760248 760248 760248 ...
##  $ tran_id.1        : Factor w/ 30037 levels "SA17.1000013",..: 13262 13128 13606 13793 13666 13423 12913 13723 13789 13722 ...
##  $ election_tp      : Factor w/ 2 levels "G2012","P2012": 2 2 2 2 2 2 2 2 2 2 ...
qplot(x=contb_receipt_amt, data = ga_mitt, binwidth=10, xlab="Amount of Contribution",ylab="Mitt Romney") +
   scale_x_continuous(limits=c(0,3000), breaks=seq(0,3000,500))

First, note that there are 96601 contributions from Georgia to President Obama’s campaign and only 52891 to Mitt Romney’s campaign. Let’s compare the number of contributions at different levels, using the same scales:

q1 <- qplot(x=contb_receipt_amt, data = ga_mitt, binwidth=10, xlab="Amount of Contribution",ylab="Mitt Romney") +
   scale_x_continuous(limits=c(0,3000), breaks=seq(0,3000,500))
q2 <- qplot(x=contb_receipt_amt, data = ga_obama, binwidth=10, xlab="Amount of Contribution",ylab="President Obama") +
   scale_x_continuous(limits=c(0,3000), breaks=seq(0,3000,500))

library(gridExtra)
## Loading required package: grid
grid.arrange(q1,q2, ncol=1) 

Comparing these two graphs, it is easy to see how many more contributions went to President Obama’s campaign than to Mitt Romney’s. Let’s now compare the distributions of amounts as proportions:

q3 <- qplot(x=contb_receipt_amt, y= ..count../sum(..count..), data = ga_obama, 
      xlab="Amount of Contribution",
      ylab="President Obama",
      geom='freqpoly') +
      scale_x_continuous(limits=c(-100,2600), breaks=seq(-100,2600, 200))
q4 <- qplot(x=contb_receipt_amt, y= ..count../sum(..count..), data = ga_mitt, 
      xlab="Amount of Contribution",
      ylab="Mitt Romney",
      geom='freqpoly') +
      scale_x_continuous(limits=c(-100,2600), breaks=seq(-100,2600, 200))
grid.arrange(q3,q4, ncol=1)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).

#g <- arrangeGrob(q3,q4, ncol=1) 
#ggsave(file="../P3/compare_amounts.pdf", g) #saves g

One can see that the bulk of President Obama’s supporters contributed less than $100, while contributions to Mitt Romney’s campaign had spikes around $500, $100 and $2500.
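The same proportional comparison can be tabulated instead of plotted. The sketch below bins amounts with `cut()` and converts counts to shares with `prop.table()`. It uses only the first ten amounts shown by `str()` for each campaign, and the bin edges are an arbitrary choice for illustration, so the printed shares are not results for the full data.

```r
# First ten contb_receipt_amt values from str(ga_obama) and str(ga_mitt).
obama_amt <- c(15, 250, 25, 112, 25, 50, 50, 250, 50, 10)
mitt_amt  <- c(250, 2500, 1000, 250, 2000, 1000, 2500, 1000, 2500, 2500)

# Bin edges are an arbitrary choice for illustration.
edges <- c(0, 100, 500, 1000, 2500)

# Share of contributions falling in each (lower, upper] bin.
bin_share <- function(x) prop.table(table(cut(x, breaks = edges)))

shares <- rbind(obama = bin_share(obama_amt), mitt = bin_share(mitt_amt))
print(round(shares, 2))
```

Each row sums to 1, so the two campaigns can be compared bin by bin even though their total counts differ.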


Bivariate plots section - Contribution Amounts vs City

Next, let’s take a look at contribution amounts per city.

qplot(contbr_city,contb_receipt_amt, data = ga_obama)

Immediately noticeable in this first city vs. contribution plot is that negative contributions are plotted. What does this mean? I went back to the website to understand, and noticed the attribute “receipt_desc”, which I plotted against the amount contributed. Almost all of the negative contributions had a description of “Refund”, while the positive contributions had a blank description, which explains the negative amounts. Since refunds don’t represent actual contributions, I will leave these data points out of my dataset.

qplot(contb_receipt_amt,receipt_desc, data = ga_obama)

ga_obama_positive = subset(ga_obama, contb_receipt_amt >0)
qplot(contbr_city,contb_receipt_amt, data = ga_obama_positive)
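The refund check described above can also be done as a cross-tabulation, which gives exact counts instead of a scatterplot. The rows below are hypothetical; on the real data the same `table()` call would use `ga_obama$contb_receipt_amt` and `ga_obama$receipt_desc`.

```r
# Hypothetical rows standing in for the FEC data.
toy <- data.frame(
  contb_receipt_amt = c(15, 250, -250, 50, -100, 35),
  receipt_desc      = c("", "", "Refund", "", "Refund", ""),
  stringsAsFactors  = FALSE
)

# Cross-tabulate the sign of the amount against the description.
sign_by_desc <- table(negative = toy$contb_receipt_amt < 0,
                      desc     = toy$receipt_desc)
print(sign_by_desc)

# Keep only the positive amounts, mirroring the subset() call above.
toy_positive <- subset(toy, contb_receipt_amt > 0)
print(nrow(toy_positive))
```

If negative amounts and "Refund" descriptions line up, all negative rows land in the "Refund" column of the table.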

This scatterplot is very hard to read, as the individual cities are not labeled and the data points are clumped together on the x-axis. You can see that most of the amounts were under $500, with clear lines at $1000, $1500, $2000, and $2500. I am going to build a new data set of per-city averages to see more detail.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
contbr_city_groups <- group_by(ga_obama_positive, contbr_city)
ga_obama.contrib_by_city <- summarise(contbr_city_groups, 
                       contrib_mean = mean(contb_receipt_amt), 
                       contrib_median = median(contb_receipt_amt),
                       n = n()) 
ga_obama.contrib_by_city <- arrange(ga_obama.contrib_by_city, contbr_city) 

head(ga_obama.contrib_by_city,30)
## Source: local data frame [30 x 4]
## 
##    contbr_city contrib_mean contrib_median   n
## 1            A    175.00000            175   2
## 2     AATLANTA      5.00000              5   1
## 3      ACWORTH     73.16497             35 591
## 4  ADAIRSVILLE     95.14286             25  14
## 5         ADEL    116.60870            100  23
## 6        AILEY    533.33333            300   3
## 7      ALAPAHA     50.16000             50  25
## 8       ALBANY    101.40471             50 433
## 9     ALBERTON    150.00000            100   3
## 10  ALEXANDRIA     95.00000            100   5
## ..         ...          ...            ... ...
ggplot(aes(x=contbr_city, y = contrib_mean),data = ga_obama.contrib_by_city ) + 
  geom_point() + scale_x_discrete()

This plot is easier to read, but I would like to make it wider and have all the different cities listed for comparison. Let’s look only at cities with n (number of contributions) > 200. I also rotated the city labels so they can be read.

ggplot(aes(x=contbr_city, y = contrib_mean),data = subset(ga_obama.contrib_by_city, n>200) ) + 
  geom_point() + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

Adding color for readability, still with n > 200.

ggplot(aes(x=contbr_city, y = contrib_mean),data = subset(ga_obama.contrib_by_city, n>200) ) + 
  geom_point(color="blue") + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

Now on to the Republicans! I went back to the FEC website and found the Georgia contributions to Mitt Romney in 2012. I made the same adjustment using the transaction id and read the file in.

qplot(contbr_city,contb_receipt_amt, data = ga_mitt)

Again, I noticed negative contributions and filtered those out. I also need to group by city and take the average, as done for the Obama data.

ga_mitt_positive = subset(ga_mitt, contb_receipt_amt >0)
qplot(contbr_city,contb_receipt_amt, data = ga_mitt_positive)

mitt_contbr_city_groups <- group_by(ga_mitt_positive, contbr_city)
ga_mitt.contrib_by_city <- summarise(mitt_contbr_city_groups, 
                       contrib_mean = mean(contb_receipt_amt), 
                       contrib_median = median(contb_receipt_amt),
                       n = n()) 
ga_mitt.contrib_by_city <- arrange(ga_mitt.contrib_by_city, contbr_city) 


ggplot(aes(x=contbr_city, y = contrib_mean),data = subset(ga_mitt.contrib_by_city, n>200) ) + 
  geom_point(color="red") + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

Plot both together and compare city with donation amount by party. I used red for the Mitt Romney contributions.

library(gridExtra)
obamap1 <- ggplot(aes(x=contbr_city, y = contrib_mean),data = subset(ga_obama.contrib_by_city, n>200) ) + 
  geom_point(color="blue") + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
mittp2 <- ggplot(aes(x=contbr_city, y = contrib_mean),data = subset(ga_mitt.contrib_by_city, n>200) ) + 
  geom_point(color="red") + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

grid.arrange(obamap1,mittp2, ncol=1) 

It is easy to see that average contributions by city to Mitt Romney’s campaign are much greater than the average contributions by city to President Obama’s campaign. You can also see that more cities have over 200 contributions for President Obama than for Mitt Romney, which matches the finding above for the overall number of contributions. I would like to see this on one plot for easier comparison, so I created two new data sets, bound them together using rbind, and plotted them on the same axes.

visual1= data.frame(subset(ga_obama.contrib_by_city, n>200))
visual2= data.frame(subset(ga_mitt.contrib_by_city, n>200))
visual1$group <- "obama"
visual2$group <- "mitt"
visual12 <- rbind(visual1, visual2)

ggplot(visual12, aes(x=contbr_city, y=contrib_mean, group=group, col=group, fill=group)) +
      geom_point() + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +  scale_y_continuous(limits=c(0, 1500),breaks=seq(0,1500,200)) 

mean_contrib_city_both <- ggplot(visual12, aes(x=contbr_city, y=contrib_mean, group=group, col=group, fill=group)) +
      geom_point() + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +  scale_y_continuous(limits=c(0, 1500),breaks=seq(0,1500,200)) 

#ggsave(mean_contrib_city_both,file="../P3/both.png",width=15,height=3)

Across the board, average Republican contributions per city are higher than Democratic ones.
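The "across the board" claim can be checked city by city by joining the two per-city summaries instead of stacking them. The sketch below uses hypothetical means and `merge()` from base R; on the real data the inputs would be `visual1` and `visual2` (without the `group` column).

```r
# Hypothetical per-city means; the real values live in visual1 and visual2.
obama_means <- data.frame(contbr_city  = c("ATHENS", "ATLANTA", "DECATUR"),
                          contrib_mean = c(80, 120, 90))
mitt_means  <- data.frame(contbr_city  = c("ATLANTA", "MARIETTA", "ATHENS"),
                          contrib_mean = c(600, 450, 300))

# Inner join: keeps only cities present in both tables.
both <- merge(obama_means, mitt_means, by = "contbr_city",
              suffixes = c("_obama", "_mitt"))
both$diff <- both$contrib_mean_mitt - both$contrib_mean_obama
print(both)

# TRUE only if the Romney mean is higher in every shared city.
all_higher <- all(both$diff > 0)
print(all_higher)
```

The join compares only cities that pass the n > 200 filter for both campaigns, so cities present in one summary alone drop out.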


Multivariate plots section - looking at retired vs working donors, and those that live in Athens.

Next, I would like to compare the contributions of retirees to both campaigns. How to get this data?

ga_obama$retired <- "N"                   
ga_obama$retired[grepl("RETIRED", ga_obama$contbr_occupation) == TRUE] <- "Y"
table(ga_obama$retired)
## 
##     N     Y 
## 76471 20130

The occupation column contains “RETIRED” plus many other occupations that include the word “RETIRED”, so I created a new variable ‘retired’ that flags all of them. 20.8% of the contributions to Obama (20130 of 96601) come from retirees.

ga_mitt$retired <- "N"
ga_mitt$retired[grepl("RETIRED", ga_mitt$contbr_occupation) == TRUE] <- "Y"
table(ga_mitt$retired)
## 
##     N     Y 
## 39741 13150

Looks like a slightly higher percentage, 24.9% (13150 of 52891), of the contributions to Mitt Romney come from retirees. Let’s first look at contribution amounts for President Obama by retired vs working people.
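The retired share follows from the two tables above as Y / (N + Y). Note that Y / N would instead be the ratio of retired to non-retired rows, not a share of all rows. A small sketch of the arithmetic, using the counts printed by `table()`; the helper name `pct_retired` is introduced here for illustration:

```r
# Counts copied from table(ga_obama$retired) and table(ga_mitt$retired).
obama_counts <- c(N = 76471, Y = 20130)
mitt_counts  <- c(N = 39741, Y = 13150)

# Share of rows flagged retired, in percent: Y / (N + Y).
pct_retired <- function(counts) 100 * counts[["Y"]] / sum(counts)

print(round(pct_retired(obama_counts), 1))  # 20.8
print(round(pct_retired(mitt_counts), 1))   # 24.9
```

On the data frames themselves, `prop.table(table(ga_obama$retired))` gives the same shares directly.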

ggplot(aes(x=contbr_city, y = contb_receipt_amt),data = ga_obama ) + 
  geom_point(aes(color=retired), stat='summary', fun.y=median) + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

This plot had too many city points to be useful, and city does not add much information to the analysis.

p1 <- qplot(x=contb_receipt_amt, data = subset(ga_obama,retired=="Y"), binwidth=10, xlab="Amount of Contribution",ylab="Number of retired contributors") +
   scale_x_continuous(limits=c(0,2000), breaks=seq(0,2000,100)) 
p2 <- qplot(x=contb_receipt_amt, data = subset(ga_obama,retired=="N"), binwidth=10, xlab="Amount of Contribution",ylab="Number of working contributors") +
   scale_x_continuous(limits=c(0,2000), breaks=seq(0,2000,100)) 
grid.arrange(p1,p2, ncol=1)  

p3 <- qplot(x=contb_receipt_amt, y= ..count../sum(..count..), data = subset(ga_obama,retired=="Y"), 
      xlab="Amount of Contribution",
      ylab="President Obama, Retired",
      geom='freqpoly') +
      scale_x_continuous(limits=c(-100,2600), breaks=seq(-100,2600, 200))
p4 <- qplot(x=contb_receipt_amt, y= ..count../sum(..count..), data = subset(ga_obama,retired=="N"), 
      xlab="Amount of Contribution",
       ylab="President Obama, Working",
      geom='freqpoly') +
      scale_x_continuous(limits=c(-100,2600), breaks=seq(-100,2600, 200))
grid.arrange(p3,p4, ncol=1)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).

This last graph shows that the distribution of contribution amounts was very similar between retired and non-retired persons in President Obama’s campaign. Now let’s look at Mitt Romney’s campaign:

p5 <- qplot(x=contb_receipt_amt, y= ..count../sum(..count..), data = subset(ga_mitt,retired=="Y"), 
      xlab="Amount of Contribution",
      ylab="Mitt Romney, Retired",
      geom='freqpoly') +
      scale_x_continuous(limits=c(-100,2600), breaks=seq(-100,2600, 200))
p6 <- qplot(x=contb_receipt_amt, y= ..count../sum(..count..), data = subset(ga_mitt,retired=="N"), 
      xlab="Amount of Contribution",
       ylab="Mitt Romney, Working",
      geom='freqpoly') +
      scale_x_continuous(limits=c(-100,2600), breaks=seq(-100,2600, 200))
grid.arrange(p5,p6, ncol=1)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).

This last graph shows that the distribution of contribution amounts was similar between retired and non-retired persons in Mitt Romney’s campaign, although more retired people than working people were giving at the lower levels.

Let’s finally analyze all occupations, not just retired/working. I am going to narrow this down to my town, Athens (home of University of Georgia). This gave us 2521 contributions.

ga_obama_athens= subset(ga_obama, contbr_city=="ATHENS")
library(dplyr)
occupation_groups <- group_by(ga_obama_athens, contbr_occupation)
ga_obama_athens.contrib_by_occ <- summarise(occupation_groups, 
                       contrib_mean = mean(contb_receipt_amt), 
                       contrib_median = median(contb_receipt_amt),
                       n = n()) 
ga_obama_athens.contrib_by_occ <- arrange(ga_obama_athens.contrib_by_occ, contbr_occupation) 


ggplot(aes(x=contbr_occupation, y = contrib_mean),data = subset(ga_obama_athens.contrib_by_occ, n>10) ) + 
  geom_point(color="blue") + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

This last plot was very interesting to me. When I first narrowed the number of contributors to over 100 for an occupation, all that was plotted was “PROFESSOR” and “RETIRED”, which is exactly what you would expect in a college town. I then let the number of contributors with that occupation go down to 5 to see all the different occupations. Many of these are associated with UGA. It was also interesting that the occupation with the highest amount was “HOMEMAKER”.

Athens, Georgia is known as a “blue” town in a “red” state. Overall in Georgia, 64.6% of contributions went to Obama. Let’s see the percentage in Athens:

ga_mitt_athens= subset(ga_mitt, contbr_city=="ATHENS")

This gives us only 456 contributions to Mitt Romney’s campaign from Athens. So for Athens, the percentage of contributions to Obama is 84.7% (2521 of 2977), compared to 64.6% for Georgia overall.
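Both percentages come from the same calculation: Obama rows divided by Obama plus Romney rows. A short sketch using the counts reported in this write-up; the helper name `obama_share` is introduced here for illustration:

```r
# Percentage of contributions that went to Obama, given the two row counts.
obama_share <- function(obama_n, mitt_n) 100 * obama_n / (obama_n + mitt_n)

georgia_pct <- obama_share(96601, 52891)  # all of Georgia
athens_pct  <- obama_share(2521, 456)     # Athens only

print(round(georgia_pct, 1))  # 64.6
print(round(athens_pct, 1))   # 84.7
```

With the data frames loaded, the inputs would be `nrow(ga_obama)`, `nrow(ga_mitt)`, `nrow(ga_obama_athens)` and `nrow(ga_mitt_athens)`.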

Final Plots and Summary

Plot One

grid.arrange(q3,q4, ncol=1)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).

Plot One Description

This plot shows that contributions to President Obama’s campaign were made at lower levels than contributions to Mitt Romney’s.

Plot Two

ggplot(visual12, aes(x=contbr_city, y=contrib_mean, group=group, col=group, fill=group)) +
      geom_point() + scale_x_discrete() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +  scale_y_continuous(limits=c(0, 1500),breaks=seq(0,1500,200)) 

Plot Two Description

Across the board, average Republican contributions per city are higher than Democratic ones.

Plot Three

grid.arrange(p3,p4, ncol=1)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Removed 3 rows containing missing values (geom_path).

Plot Three Description

Retired persons gave at the same levels as working persons in President Obama’s campaign.

Reflection

I started with 96,601 contributions to President Obama and 52,891 contributions to Mitt Romney in the 2012 presidential campaign from Georgia. The data showed that although contribution amounts to President Obama’s campaign were lower, more people contributed at these lower levels. I looked at this by city and by occupation status (retired or working).
Then I looked at Athens to see the distribution of contributions across occupations. I also confirmed the idea that Athens is more Democratic than Georgia as a whole. I would be interested to compare contributions by factors such as age, income level, religious affiliation, etc. I wish the dataset had some of these attributes.